Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn on usage of Maps.transformValues #2518

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

jakobjuelich
Copy link
Contributor

Before this PR

After this PR

==COMMIT_MSG==
Warn on usage of Maps.transformValues
==COMMIT_MSG==

Possible downsides?

@changelog-app
Copy link

changelog-app bot commented Mar 13, 2023

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Warn on usage of Maps.transformValues

Check the box to generate changelog(s)

  • Generate changelog entry

Copy link
Contributor

@carterkozak carterkozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree with this -- it's not so much an unexpected detail as the way that a common guava utility function works (and is documented). In most places that I'm aware of where this is used, this behavior is relied upon to avoid producing expensive intermediate objects.
We generally add checks like this when behavior has side-effects or edge cases that are wildly unexpected and dangerous, and the vast majority of usages in our codebases are provably unsound. I don't think this meets the bar for forcing hundreds or even thousands of code changes to be added.

@jakobjuelich
Copy link
Contributor Author

I hear your point. It is documented, yet ambiguously named and easy to miss. It just caused a pretty bad P0 last week and I know of at least one more time it was erroneously used. Sure, Errorprone can't safeguard against all the possible bugs, but I do expect this to happen more often and wonder what we can do.

I'm trying to think of a way to make it more specific, but struggle. Maybe, i.e., if we scan for places where transformValues is directly assigned to a variable of type Map, that'd narrow it down a little.

Do you have any advice?

Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can be an ERROR level check as there is no automated fix (nor should there be), and this change would fail on rollout to many repos requiring broad suppressions or disabling of the check.

@BugPattern(
link = "https://github.com/palantir/gradle-baseline#baseline-error-prone-checks",
linkType = BugPattern.LinkType.CUSTOM,
severity = SeverityLevel.ERROR,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the issue that spurred this PR, I think this should be a SUGGESTION as there are many uses of Maps#transformValues that I would not want to robotically replace with eager transformations.

There are many places where we very intentionally use Maps.transformValues to avoid intermediate collection copies (due to both allocations, CPU, and memory tradeoffs) in places where we're handling large volumes of transformations. Some public examples would be AtlasDB's use throughout much of its data processing.

private static final String ERROR_MESSAGE = "The transformValues API of Guava Maps creates a lazily evaluated "
+ "view of the source Map. Repeated access of the same key leads to repeated evaluations of the "
+ "transformer function. This is often unintended and can cause severe performance degradation."
+ "Where this is actually intended, suppress this warning.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, I'm hesitant to treat lazy transforms as something that completely needs to be avoided or surprised. If we were to do that, do we block all the other lazy APIs from Guava (e.g. Iterators/(Fluent)Iterables/Collections2/Lists/Sets/Maps transform* methods)? Do we forbid returning Stream from methods as these are lazy until terminal operation? Do we forbid all Guava APIs that return collection/map views?

I would argue no we should not forbid these to all of the above. That said, in general most places should probably prefer eager transforms by default (especially as it is simple to stream & collect into an Immutable* or unmodifiable collection), but I don't think it is an error to use these well documented APIs as they were designed.

implements BugChecker.MethodInvocationTreeMatcher {
private static final long serialVersionUID = 1L;
private static final String ERROR_MESSAGE = "The transformValues API of Guava Maps creates a lazily evaluated "
+ "view of the source Map. Repeated access of the same key leads to repeated evaluations of the "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how feasible this would be, but a more specifically targeted checker around reuse of lazily transformed collections might lead to higher signal warnings. Possibly couple this with checks identified RPCs in lazy evaluation might also be high signal (as those may be places that should be utilizing a batch API if available).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants